Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation
Authors
Abstract
Benefiting from sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) achieves great success in neural machine translation tasks. However, existing distillation has side effects, such as propagating errors from the teacher to NAT students, which may limit further improvements of NAT models and are rarely discussed in existing research. In this paper, we introduce selective knowledge distillation by introducing an NAT evaluator to select NAT-friendly targets that are of high quality and easy to learn. In addition, we introduce a simple yet effective progressive distillation method to boost NAT performance. Experiment results on multiple WMT language directions and several representative NAT architectures show that our approach can realize a flexible trade-off between the quality and complexity of training data for NAT models, achieving strong performances. Further analysis shows that distilling only 5% of the raw translations can help an NAT outperform its counterpart trained on raw data by about 2.4 BLEU.
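As a rough sketch of the selection step described above (the names `teacher_translate`, `evaluator_score`, and `keep_ratio` are hypothetical stand-ins for illustration, not the paper's actual interface), selective distillation can be pictured as replacing only the highest-scoring fraction of references with teacher outputs:

```python
# Minimal sketch of selective sequence-level knowledge distillation.
# Assumptions (not from the paper): `teacher_translate` produces a distilled
# target for a source sentence, and `evaluator_score` returns a scalar where
# higher means the distilled target is more NAT-friendly (high quality,
# easy to learn).

def build_selective_kd_corpus(src_sentences, refs, teacher_translate,
                              evaluator_score, keep_ratio=0.5):
    """Distill a `keep_ratio` fraction of the corpus, keeping raw
    references for the rest."""
    distilled = [teacher_translate(s) for s in src_sentences]
    scores = [evaluator_score(s, d) for s, d in zip(src_sentences, distilled)]

    # Keep the distilled targets with the highest evaluator scores.
    k = int(len(src_sentences) * keep_ratio)
    chosen = set(sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)[:k])

    targets = [distilled[i] if i in chosen else refs[i]
               for i in range(len(refs))]
    return list(zip(src_sentences, targets))
```

Under the same assumptions, the progressive method mentioned in the abstract could be imitated by raising `keep_ratio` over the course of training, though the paper's actual schedule may differ.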
Similar Resources
Non-Autoregressive Neural Machine Translation
Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we...
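For intuition about the parallelism claimed above, here is a minimal PyTorch-style sketch; the `decoder` signature is an assumption for illustration only, not the model from the paper:

```python
import torch

def nonautoregressive_decode(decoder, encoder_out, pred_len):
    # All target positions are filled in a single forward pass, with no
    # causal mask and no dependence on previously generated tokens; this
    # is what removes the output-length factor from decoding latency.
    positions = torch.arange(pred_len).unsqueeze(0)  # shape (1, pred_len)
    logits = decoder(positions, encoder_out)         # hypothetical signature
    return logits.argmax(dim=-1)                     # every token at once
```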
Ensemble Distillation for Neural Machine Translation
Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. In this work, we run experiments with different kinds of teacher networks to enhance the translation performance of a student Neural Machine Translation (NMT) network. We demonstrate techniques based on an ensemble and a best BLEU teacher network. We also show ...
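One common word-level form of ensemble distillation trains the student toward the teachers' averaged token distributions; below is a minimal NumPy sketch assuming that variant (the "best BLEU teacher" setup mentioned above would instead distill from a single selected teacher):

```python
import numpy as np

def ensemble_soft_targets(teacher_logits):
    """Average the per-token distributions of several teachers; the student
    is then trained with cross-entropy against this averaged distribution."""
    probs = []
    for logits in teacher_logits:  # each array: (seq_len, vocab_size)
        z = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs.append(z / z.sum(axis=-1, keepdims=True))
    return np.mean(probs, axis=0)
```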
Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
Although neural machine translation has made significant progress recently, how to integrate multiple overlapping, arbitrary prior knowledge sources remains a challenge. In this work, we propose to use posterior regularization to provide a general framework for integrating prior knowledge into neural machine translation. We represent prior knowledge sources as features in a log-linear model, wh...
Pre-Translation for Neural Machine Translation
Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when...
Neural Name Translation Improves Neural Machine Translation
In order to control computational complexity, neural machine translation (NMT) systems convert all rare words outside the vocabulary into a single unk symbol. A previous solution (Luong et al., 2015) resorts to using multiple numbered unks to learn the correspondence between source and target rare words. However, words unseen in the training corpus cannot be handled by this method at test time. And it a...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26555